
Scraping COVID-19 data with Python and saving it to Excel (provincial data for China, global data, and historical data for China and abroad)

2023-08-11 02:43 | Source: compiled from the web | Views: 265

Workflow

1. Find the URL that serves the epidemic data.

For example, the Tencent News dashboard https://news.qq.com/zt2020/page/feiyan.htm#/ or the NetEase News dashboard https://wp.m.163.com/163/page/news/virus_report/index.html?nw=1&anw=1

Once you have the URL and a user-agent string, open the URL and inspect the JSON format; the rest follows from these steps.
2. To avoid being blocked as a bot, masquerade as a browser: send `headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'}` with every request.
3. Analyze the response and work out where the data sits in the JSON payload.
4. Read the data out and store it.
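Before writing the scraper it helps to sketch the shape of the payload. The fragment below mocks a simplified, assumed slice of the NetEase `list-total` response to show how the keys used throughout this post (`areaTree`, `today`, `total`, `lastUpdateTime`) nest; the live API's exact field set may differ, and the numbers here are invented.

```python
# A mocked fragment of the list-total response. Field names follow the
# code in this post; values are made up for illustration.
sample = {
    "data": {
        "areaTree": [
            {
                "name": "美国",
                "today": {"confirm": 30000},
                "total": {"confirm": 80000000, "dead": 980000, "heal": 0},
                "lastUpdateTime": "2022-04-05 07:30:08",
            }
        ]
    }
}

# Drill down exactly the way the scraping loops below do.
area = sample["data"]["areaTree"][0]
print(area["name"], area["today"]["confirm"], area["total"]["confirm"])
# → 美国 30000 80000000
```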

Scraping the latest global epidemic data

```python
import requests        # fetch the page
import json            # read the JSON payload by key
import xlwings as xw   # drive Excel

url = 'https://c.m.163.com/ug/api/wuhan/app/data/list-total?t=329822670771'  # request URL
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'}  # masquerade as a browser
response = requests.get(url, headers=headers)
#print(response.status_code)  # 200 means the request succeeded
#print(response.json())       # dump the payload

wb = xw.Book()              # open a new Excel workbook
sht = wb.sheets('sheet1')   # select the default worksheet
sht.range('A1').value = '地区'      # region
sht.range('B1').value = '新增确诊'  # new confirmed
sht.range('C1').value = '累计确诊'  # total confirmed
sht.range('D1').value = '死亡'      # deaths
sht.range('E1').value = '治愈'      # recovered
sht.range('F1').value = '日期'      # date
```

(Note: the header cells are written with `.value`; the original post used `.values`, which is not an xlwings `Range` attribute and would silently fail to write to the sheet.)

After opening the URL and analyzing the data format, extract the fields and write them into Excel.

```python
json_data = response.json()['data']['areaTree']
#print(json_data)
for i in range(len(json_data)):   # one entry per country/region (206 at the time of writing)
    earth_data = json_data[i]
    #print(earth_data)
    name = earth_data['name']
    sht.range(f'A{i+2}').value = name
    today_confirm = json.dumps(earth_data['today']['confirm'])   # serialize the number as a string
    sht.range(f'B{i+2}').value = today_confirm
    total_confirm = json.dumps(earth_data['total']['confirm'])
    sht.range(f'C{i+2}').value = total_confirm
    total_dead = json.dumps(earth_data['total']['dead'])
    sht.range(f'D{i+2}').value = total_dead
    total_heal = json.dumps(earth_data['total']['heal'])
    sht.range(f'E{i+2}').value = total_heal
    date = earth_data['lastUpdateTime']
    sht.range(f'F{i+2}').value = date
    #print("Region: "+name, "New confirmed: "+today_confirm, "Total confirmed: "+total_confirm, "Deaths: "+total_dead, "Recovered: "+total_heal)
```

Output: (screenshot omitted)
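The loop above indexes nested keys directly, so any entry missing a field raises `KeyError`. A hedged alternative, shown here on a hand-written entry rather than live data, flattens one `areaTree` entry with `dict.get` so gaps become `None` instead of crashing the run (`row_from_entry` is a hypothetical helper, not part of the original script):

```python
def row_from_entry(entry):
    """Flatten one areaTree entry into a spreadsheet row.
    dict.get turns a missing key into None instead of a KeyError."""
    today = entry.get("today", {})
    total = entry.get("total", {})
    return [
        entry.get("name"),
        today.get("confirm"),
        total.get("confirm"),
        total.get("dead"),
        total.get("heal"),
        entry.get("lastUpdateTime"),
    ]

# Invented sample entry mirroring the structure the scraper reads.
entry = {"name": "日本", "today": {"confirm": 100},
         "total": {"confirm": 5000, "dead": 20, "heal": 4800},
         "lastUpdateTime": "2022-04-05"}
print(row_from_entry(entry))
# → ['日本', 100, 5000, 20, 4800, '2022-04-05']
```

Rows built this way can be written to the sheet exactly as before, one cell at a time.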

Likewise, scraping China's historical epidemic data

```python
import requests        # fetch the page
import json            # read the JSON payload
import xlwings as xw   # drive Excel

url = 'https://c.m.163.com/ug/api/wuhan/app/data/list-total?t=329822670771'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'}
response = requests.get(url, headers=headers)
#print(response.status_code)  # 200 means the request succeeded
#print(response.json())       # dump the payload

wb = xw.Book()              # open a new Excel workbook
sht = wb.sheets('sheet1')   # select the default worksheet
sht.range('A1').value = '地区'      # region
sht.range('B1').value = '新增确诊'  # new confirmed
sht.range('C1').value = '累计确诊'  # total confirmed
sht.range('D1').value = '死亡'      # deaths
sht.range('E1').value = '治愈'      # recovered
sht.range('F1').value = '日期'      # date

json_data = response.json()['data']['chinaDayList']
#print(json_data)
for i in range(len(json_data)):   # one entry per day (59 at the time of writing)
    earth_data = json_data[i]
    #print(earth_data)
    #name = earth_data['name']    # daily entries have no region name
    #sht.range(f'A{i+2}').value = name
    today_confirm = json.dumps(earth_data['today']['confirm'])
    sht.range(f'B{i+2}').value = today_confirm
    total_confirm = json.dumps(earth_data['total']['confirm'])
    sht.range(f'C{i+2}').value = total_confirm
    total_dead = json.dumps(earth_data['total']['dead'])
    sht.range(f'D{i+2}').value = total_dead
    total_heal = json.dumps(earth_data['total']['heal'])
    sht.range(f'E{i+2}').value = total_heal
    date = earth_data['date']
    sht.range(f'F{i+2}').value = date
```

Output: (screenshot omitted)
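The `chinaDayList` entries carry a `date` string. The API is assumed here (but not guaranteed) to return them already sorted; if you need chronological order regardless, the stdlib `datetime` module can key the sort. A minimal sketch on made-up rows:

```python
from datetime import datetime

# Invented history entries in the shape the scraper reads.
history = [
    {"date": "2020-02-01", "total": {"confirm": 14380}},
    {"date": "2020-01-21", "total": {"confirm": 440}},
]

# Parse the date string so the sort is chronological, not lexical.
history.sort(key=lambda d: datetime.strptime(d["date"], "%Y-%m-%d"))
print([d["date"] for d in history])
# → ['2020-01-21', '2020-02-01']
```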

Likewise, scraping the US historical epidemic data for 2020-2022

```python
import requests        # fetch the page
import json            # read the JSON payload
import xlwings as xw   # drive Excel

url = 'https://c.m.163.com/ug/api/wuhan/app/data/list-by-area-code?areaCode=7&t=1649117007316'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55'}
response = requests.get(url, headers=headers)

wb = xw.Book()              # open a new Excel workbook
sht = wb.sheets('sheet1')   # select the default worksheet
sht.range('A1').value = '地区'      # region
sht.range('B1').value = '新增确诊'  # new confirmed
sht.range('C1').value = '累计确诊'  # total confirmed
sht.range('D1').value = '死亡'      # deaths
sht.range('E1').value = '治愈'      # recovered
sht.range('F1').value = '日期'      # date

json_data = response.json()['data']['list']
#print(json_data)
for i in range(len(json_data)):   # one entry per day (772 at the time of writing)
    earth_data = json_data[i]
    #print(earth_data)
    #name = earth_data['name']    # daily entries have no region name
    #sht.range(f'A{i+2}').value = name
    today_confirm = json.dumps(earth_data['today']['confirm'])
    sht.range(f'B{i+2}').value = today_confirm
    total_confirm = json.dumps(earth_data['total']['confirm'])
    sht.range(f'C{i+2}').value = total_confirm
    total_dead = json.dumps(earth_data['total']['dead'])
    sht.range(f'D{i+2}').value = total_dead
    total_heal = json.dumps(earth_data['total']['heal'])
    sht.range(f'E{i+2}').value = total_heal
    date = earth_data['date']
    sht.range(f'F{i+2}').value = date
```

Output: (screenshots omitted)
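The `total.confirm` column in these historical dumps is cumulative, so daily new cases can be recovered by differencing neighbouring totals. A minimal sketch on hypothetical numbers:

```python
# Hypothetical cumulative confirmed counts for three consecutive days.
totals = [100, 150, 230]

# First day keeps its total; every later day is today minus yesterday.
new = [totals[0]] + [b - a for a, b in zip(totals, totals[1:])]
print(new)
# → [100, 50, 80]
```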

Scraping the latest data for every Chinese province

```python
import pandas as pd
import requests
import json

def get_data():
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5'
    area = requests.get(url).json()
    data = json.loads(area['data'])        # the 'data' field is itself a JSON string
    update_time = data['lastUpdateTime']
    all_counties = data['areaTree']
    all_list = []
    for country_data in all_counties:
        if country_data['name'] != '中国':  # keep only China
            continue
        all_provinces = country_data['children']
        for province_data in all_provinces:
            province_name = province_data['name']
            all_cities = province_data['children']
            for city_data in all_cities:
                city_name = city_data['name']
                city_total = city_data['total']
                province_result = {'province': province_name,
                                   'city': city_name,
                                   'update_time': update_time}
                province_result.update(city_total)   # merge the per-city totals
                all_list.append(province_result)
    df = pd.DataFrame(all_list)
    # utf_8_sig writes a BOM so Excel displays the Chinese text correctly
    df.to_csv('data.csv', index=False, encoding='utf_8_sig')

get_data()
```

Output: (screenshot omitted)
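If pandas is unavailable, the same CSV can be written with the stdlib `csv` module. A sketch on hand-written rows (the field names mirror the ones the function above collects; the values are invented):

```python
import csv
import io

# Invented rows in the shape get_data() produces.
rows = [
    {"province": "湖北", "city": "武汉", "confirm": 50340},
    {"province": "湖北", "city": "孝感", "confirm": 3518},
]

# Write to an in-memory buffer; swap in open('data.csv', 'w',
# newline='', encoding='utf_8_sig') to produce a real file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["province", "city", "confirm"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])
# → province,city,confirm
```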


